Online Bayesian phylogenetic inference: theoretical foundations via Sequential Monte Carlo.

Authors

  • Vu Dinh
  • Aaron E Darling
  • Frederick A Matsen IV
Abstract

Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here we provide theoretical results on the consistency and stability of methods for online Bayesian phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles. Next we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by bounding the effective sample size (ESS) with a given number of particles from below. We show that the ESS is guaranteed to grow linearly as the number of particles in an SMC sampler grows. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.
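The effective sample size mentioned in the abstract can be illustrated with a small sketch. The code below assumes the common Kish estimator, ESS = (Σ w)² / Σ w², computed from unnormalized log-weights; the abstract does not state which ESS definition the paper uses, so this is an illustration of the general idea rather than the authors' formulation. The function name `effective_sample_size` is hypothetical.

```python
import math


def effective_sample_size(log_weights):
    """Kish effective sample size from unnormalized log-weights.

    Assumption: ESS = (sum w)^2 / sum(w^2). This is a widely used
    estimator, not necessarily the definition used in the paper.
    """
    if not log_weights:
        raise ValueError("need at least one particle weight")
    # Subtract the maximum log-weight before exponentiating so that
    # very small likelihoods do not underflow all weights to zero.
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


if __name__ == "__main__":
    # Equal weights: every particle contributes, so ESS equals the particle count.
    print(effective_sample_size([0.0] * 10))
    # One dominant particle: ESS collapses toward 1.
    print(effective_sample_size([0.0, -1000.0]))
```

Under this estimator, ESS ranges from 1 (one particle carries all the weight) to the number of particles (all weights equal), which is the sense in which a lower bound that grows linearly with the particle count is a useful guarantee.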

Related resources

Phylogenetic Inference via Sequential Monte Carlo

Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this ha...

Bayesian Phylogenetic Inference using a Combinatorial Sequential Monte Carlo Method

The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series models. Such methods have been recently develo...

Accelerating Bayesian inference for evolutionary biology models

Motivation: Bayesian inference is widely used nowadays and relies largely on Markov chain Monte Carlo (MCMC) methods. Evolutionary biology has greatly benefited from the developments of MCMC methods, but the design of more complex and realistic models and the ever growing availability of novel data is pushing the limits of the current use of these methods. Results: We present a parallel Metropo...

Supporting Online Material for Phylogenetic MCMC Algorithms Are Misleading on Mixtures of Trees

Markov chain Monte Carlo algorithms play a key role in the Bayesian approach to phylogenetic inference. In this paper, we present the first theoretical work analyzing the rate of convergence of several Markov chains widely used in phylogenetic inference. We analyze simple, realistic examples where these Markov chains fail to converge quickly. In particular, the studied data is generated from a ...

An Adaptive Sequential Monte Carlo Sampler

Sequential Monte Carlo (SMC) methods are not only a popular tool in the analysis of state–space models, but offer an alternative to Markov chain Monte Carlo (MCMC) in situations where Bayesian inference must proceed via simulation. This paper introduces a new SMC method that uses adaptive MCMC kernels for particle dynamics. The proposed algorithm features an online stochastic optimization proce...

Journal:
  • Systematic Biology

Publication date: 2017